5.5 EMP_impute

Data imputation estimates and fills in missing values in a dataset algorithmically. The goal is to provide reasonable estimates for the missing entries so that the data can be used more fully in subsequent analysis or modeling. The module EMP_impute performs imputation based on the Chained Random Forests (CRF) algorithm. Unlike traditional mean or mode imputation, which fills every gap in a variable with the same value, CRF predicts each missing value from the other variables. It can impute both continuous and categorical variables. The module EMP_impute supports imputation of assay, rowdata, and coldata.
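To make the chained idea concrete, here is a minimal sketch using the missRanger package, a publicly available R implementation of chained random forests. This section does not state which backend EMP_impute calls, so the package choice, the toy `iris` data, and the settings (`num.trees = 50`, `seed = 1`) are assumptions for illustration only.

```r
library(missRanger)  # assumption: installed from CRAN; not named in this section

# Toy data: iris with about 10% of the values in every column set to NA
iris_na <- generateNA(iris, p = 0.1, seed = 1)
colSums(is.na(iris_na))       # missing values per column before imputation

# Each incomplete column is predicted in turn by a random forest trained on
# the other columns; the cycle repeats until the out-of-bag error stops
# improving or the maximum number of iterations is reached
iris_imputed <- missRanger(iris_na, formula = . ~ ., num.trees = 50,
                           seed = 1, verbose = 0)

colSums(is.na(iris_imputed))  # missing values per column after imputation
```

Because the forests handle numeric and factor columns alike, the categorical `Species` column is imputed in the same pass as the numeric measurements.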

5.5.1 Impute coldata (Sample-related data)

Coldata of the project refers to sample-related data, which often contain missing values to varying degrees. Common causes include errors or omissions during data collection, withdrawal of subjects from the study, and technical problems (e.g., equipment failure or data transmission errors). The module EMP_impute can impute missing values in coldata based on the CRF algorithm.

🏷️Example:

Before imputation: coldata contains many missing values.

MAE |>
  EMP_assay_extract('geno_ec') |>
  EMP_coldata_extract()
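
To quantify how much is missing before imputing, the extracted table can be summarised with base R. The sketch below uses a small made-up data frame as a stand-in for the coldata; the values and the `Sex` column are hypothetical, and only the PHQ9 and GAD7 names come from this section.

```r
# Hypothetical stand-in for a coldata table; all values are made up
coldata_toy <- data.frame(
  PHQ9 = c(5, NA, 12, NA, 3),
  GAD7 = c(NA, 4, 9, 7, NA),
  Sex  = factor(c("F", "M", NA, "F", "M"))
)

# Number and proportion of missing values per column
na_count <- colSums(is.na(coldata_toy))
na_prop  <- colMeans(is.na(coldata_toy))
data.frame(na_count, na_prop)
```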

After imputation: all missing values in coldata are imputed.

MAE |>
  EMP_assay_extract('geno_ec') |>
  EMP_impute() |>
  EMP_coldata_extract()

Users can also impute only a subset of the coldata variables. For example, the following imputes missing values only for PHQ9 and GAD7.

MAE |>
  EMP_assay_extract('geno_ec') |>
  EMP_impute(.formula = PHQ9+GAD7 ~ .) |>
  EMP_coldata_extract()
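
In the formula above, the left-hand side lists the variables to impute. The missRanger package uses the same two-sided convention (left side: variables to impute, right side: predictors), so a sketch on toy data can show the effect. This is an assumption about comparable tooling, not a statement about EMP_impute internals, and the `iris` columns stand in for PHQ9 and GAD7.

```r
library(missRanger)  # assumption: installed from CRAN

# Toy data: every column of iris gets about 10% missing values
iris_na <- generateNA(iris, p = 0.1, seed = 1)

# Impute only Sepal.Length and Species; "." on the right-hand side
# requests all columns as candidate predictors
partial <- missRanger(iris_na, formula = Sepal.Length + Species ~ .,
                      num.trees = 50, seed = 1, verbose = 0)

# The two targeted columns are complete; the other columns keep their NAs
colSums(is.na(partial))
```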

5.5.2 Impute assay (Experimental data)

When the assay contains no missing values, the module EMP_impute reports "Assay data has no NA value!".

5.5.3 Impute rowdata (Feature-related data)

Rowdata of the project refers to feature-related data, and its missing values are mainly caused by incomplete database annotations. Although the module EMP_impute supports imputing rowdata via the parameter rowdata=T, the imputed annotations rarely meet practical needs, so imputing rowdata is not recommended.

5.5.4 Imputation of the entire multi-omics object's coldata

Users can also impute the phenotype data (coldata) directly on the entire multi-omics data container. Note that in this case the output object is a MultiAssayExperiment.

MAE |>
  EMP_impute()

Copyright © 382983280@qq.com 2024, all rights reserved. Powered by Gitbook. Last updated: 2025-04-16 02:48:31
